possible action
Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM
Duong, Thang, Yang, Minglai, Zhang, Chicheng
We investigate the usage of Large Language Model (LLM) in collecting high-quality data to warm-start Reinforcement Learning (RL) algorithms for learning in some classical Markov Decision Process (MDP) environments. In this work, we focus on using LLM to generate an off-policy dataset that sufficiently covers state-actions visited by optimal policies, then later using an RL algorithm to explore the environment and improve the policy suggested by the LLM. Our algorithm, LORO, can both converge to an optimal policy and have a high sample efficiency thanks to the LLM's good starting policy. On multiple OpenAI Gym environments, such as CartPole and Pendulum, we empirically demonstrate that LORO outperforms baseline algorithms such as pure LLM-based policies, pure RL, and a naive combination of the two, achieving up to $4 \times$ the cumulative rewards of the pure RL baseline.
- North America > United States > Arizona (0.05)
- Europe > Portugal > Porto > Porto (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- (2 more...)
- Leisure & Entertainment > Games (0.68)
- Education (0.46)
- Energy (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Agent-based Modeling meets the Capability Approach for Human Development: Simulating Homelessness Policy-making
Aguilera, Alba, Osman, Nardine, Curto, Georgina
The global rise in homelessness calls for urgent and alternative policy solutions. Non-profits and governmental organizations alert about the many challenges faced by people experiencing homelessness (PEH), which include not only the lack of shelter but also the lack of opportunities for personal development. In this context, the capability approach (CA), which underpins the United Nations Sustainable Development Goals (SDGs), provides a comprehensive framework to assess inequity in terms of real opportunities. This paper explores how the CA can be combined with agent-based modelling and reinforcement learning. The goals are: (1) implementing the CA as a Markov Decision Process (MDP), (2) building on such MDP to develop a rich decision-making model that accounts for more complex motivators of behaviour, such as values and needs, and (3) developing an agent-based simulation framework that allows to assess alternative policies aiming to expand or restore people's capabilities. The framework is developed in a real case study of health inequity and homelessness, working in collaboration with stakeholders, non-profits and domain experts. The ultimate goal of the project is to develop a novel agent-based simulation framework, rooted in the CA, which can be replicated in a diversity of social contexts to assess policies in a non-invasive way.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- (13 more...)
- Government (1.00)
- Energy (1.00)
- Health & Medicine > Consumer Health (0.93)
- Law > Statutes (0.68)
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
Kumar, Shivani, Jurgens, David
Moral reasoning is a complex cognitive process shaped by individual experiences and cultural contexts and presents unique challenges for computational analysis. While natural language processing (NLP) offers promising tools for studying this phenomenon, current research lacks cohesion, employing discordant datasets and tasks that examine isolated aspects of moral reasoning. We bridge this gap with UniMoral, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas annotated with labels for action choices, ethical principles, contributing factors, and consequences, alongside annotators' moral and cultural profiles. Recognizing the cultural relativity of moral reasoning, UniMoral spans six languages, Arabic, Chinese, English, Hindi, Russian, and Spanish, capturing diverse socio-cultural contexts. We demonstrate UniMoral's utility through a benchmark evaluations of three large language models (LLMs) across four tasks: action prediction, moral typology classification, factor attribution analysis, and consequence generation. Key findings reveal that while implicitly embedded moral contexts enhance the moral reasoning capability of LLMs, there remains a critical need for increasingly specialized approaches to further advance moral reasoning in these models.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Michigan (0.04)
- (12 more...)
- Education (1.00)
- Health & Medicine > Therapeutic Area (0.93)
Control-Theoretic Analysis of Shared Control Systems
Aronson, Reuben M., Short, Elaine Schaertl
Users of shared control systems change their behavior in the presence of assistance, which conflicts with assumpts about user behavior that some assistance methods make. In this paper, we propose an analysis technique to evaluate the user's experience with the assistive systems that bypasses required assumptions: we model the assistance as a dynamical system that can be analyzed using control theory techniques. We analyze the shared autonomy assistance algorithm and make several observations: we identify a problem with runaway goal confidence and propose a system adjustment to mitigate it, we demonstrate that the system inherently limits the possible actions available to the user, and we show that in a simplified setting, the effect of the assistance is to drive the system to the convex hull of the goals and, once there, add a layer of indirection between the user control and the system behavior. We conclude by discussing the possible uses of this analysis for the field.
- North America > United States > Massachusetts > Middlesex County > Medford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no Libraries
Reinforcement learning (RL) algorithms have become indispensable tools in artificial intelligence, empowering agents to acquire optimal decision-making policies through interactions with their environment and feedback mechanisms. This study explores the performance of RL agents in both two-dimensional (2D) and three-dimensional (3D) environments, aiming to research the dynamics of learning across different spatial dimensions. A key aspect of this investigation is the absence of pre-made libraries for learning, with the algorithm developed exclusively through computational mathematics. The methodological framework centers on RL principles, employing a Q-learning agent class and distinct environment classes tailored to each spatial dimension. The research aims to address the question: How do reinforcement learning agents adapt and perform in environments of varying spatial dimensions, particularly in 2D and 3D settings? Through empirical analysis, the study evaluates agents' learning trajectories and adaptation processes, revealing insights into the efficacy of RL algorithms in navigating complex, multi-dimensional spaces. Reflections on the findings prompt considerations for future research, particularly in understanding the dynamics of learning in higher-dimensional environments.
- South America > Brazil > São Paulo (0.05)
- South America > Brazil > Federal District > Brasília (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
AIhub monthly digest: November 2023 – deconstructing sentiment analysis, few-shot learning for medical images, and Angry Birds structure generation
Welcome to our November 2023 monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, find out about recent events, and more. This month, we deconstruct sentiment analysis, find out about few-shot learning in medical imaging, investigate rare events, and look forward to our science communication training session at NeurIPS. In their paper The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis, Pranav Venkit, Mukund Srinath, Sanjana Gautam, Saranya Venkatraman, Vipul Gupta, Rebecca Passonneau and Shomir Wilson present a review of the sociotechnical aspects of sentiment analysis. In this interview, Pranav and Mukund tell us more about sentiment analysis, how they went about surveying the literature, and recommendations for researchers in the field. Deep learning models employed in medical imaging are limited by the lack of annotated images.
- Europe > United Kingdom (0.05)
- Europe > Spain > Galicia > A Coruña Province > A Coruña (0.05)
- Overview (0.92)
- Instructional Material (0.70)
- Health & Medicine > Diagnostic Medicine > Imaging (0.92)
- Leisure & Entertainment > Games > Computer Games (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)
An approach for automatically determining the possible actions in computer game states
Due to the great difficulty of thoroughly testing video game software by hand, it is desirable to have AI agents that can automatically explore different game functionalities. A key requirement of such agents is a model of the player actions that the agent can use to both determine the set of possible actions in different game states, as well as perform a chosen action on the game selected by the agent's policy. The typical game engines that are in use today do not offer such a model of actions, leading existing work to either require human effort to manually define the action model or imprecisely guess the possible actions. In our work, we demonstrate how program analysis is an effective solution to this problem by developing a state-of-the-art analysis for the user input handling logic present in games that can automatically model game actions with a discrete action space. Our key insight is that the possible actions of games correspond to the different execution paths that can be taken through the user input handling logic present in the game's code.
Neural Improvement Heuristics for Graph Combinatorial Optimization Problems
Garmendia, Andoni I., Ceberio, Josu, Mendiburu, Alexander
Recent advances in graph neural network architectures and increased computation power have revolutionized the field of combinatorial optimization (CO). Among the proposed models for CO problems, Neural Improvement (NI) models have been particularly successful. However, existing NI approaches are limited in their applicability to problems where crucial information is encoded in the edges, as they only consider node features and node-wise positional encodings. To overcome this limitation, we introduce a novel NI model capable of handling graph-based problems where information is encoded in the nodes, edges, or both. The presented model serves as a fundamental component for hill-climbing-based algorithms that guide the selection of neighborhood operations for each iteration. Conducted experiments demonstrate that the proposed model can recommend neighborhood operations that outperform conventional versions for the Preference Ranking Problem with a performance in the 99th percentile. We also extend the proposal to two well-known problems: the Traveling Salesman Problem and the Graph Partitioning Problem, recommending operations in the 98th and 97th percentile, respectively.
- Research Report (0.64)
- Instructional Material (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Learn What Is Possible, Then Choose What Is Best: Disentangling One-To-Many Relations in Language Through Text-based Games
Language models pre-trained on large self-supervised corpora, followed by task-specific fine-tuning has become the dominant paradigm in NLP. These pre-training datasets often have a one-to-many structure--e.g. in dialogue there are many valid responses for a given context. However, only some of these responses will be desirable in our downstream task. This raises the question of how we should train the model such that it can emulate the desirable behaviours, but not the undesirable ones. Current approaches train in a one-to-one setup--only a single target response is given for a single dialogue context--leading to models only learning to predict the average response, while ignoring the full range of possible responses. Using text-based games as a testbed, our approach, PASA, uses discrete latent variables to capture the range of different behaviours represented in our larger pre-training dataset. We then use knowledge distillation to distil the posterior probability distribution into a student model. This probability distribution is far richer than learning from only the hard targets of the dataset, and thus allows the student model to benefit from the richer range of actions the teacher model has learned. Results show up to 49% empirical improvement over the previous state-of-the-art model on the Jericho Walkthroughs dataset.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Research Report > Experimental Study (0.47)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
Workflow Discovery from Dialogues in the Low Data Regime
Hattami, Amine El, Raimondo, Stefania, Laradji, Issam, Vazquez, David, Rodriguez, Pau, Pal, Chris
Text-based dialogues are now widely used to solve real-world problems. In cases where solution strategies are already known, they can sometimes be codified into workflows and used to guide humans or artificial agents through the task of helping clients. We introduce a new problem formulation that we call Workflow Discovery (WD) in which we are interested in the situation where a formal workflow may not yet exist. Still, we wish to discover the set of actions that have been taken to resolve a particular problem. We also examine a sequence-to-sequence (Seq2Seq) approach for this novel task. We present experiments where we extract workflows from dialogues in the Action-Based Conversations Dataset (ABCD). Since the ABCD dialogues follow known workflows to guide agents, we can evaluate our ability to extract such workflows using ground truth sequences of actions. We propose and evaluate an approach that conditions models on the set of possible actions, and we show that using this strategy, we can improve WD performance. Our conditioning approach also improves zero-shot and few-shot WD performance when transferring learned models to unseen domains within and across datasets. Further, on ABCD a modified variant of our Seq2Seq method achieves state-of-the-art performance on related but different problems of Action State Tracking (AST) and Cascading Dialogue Success (CDS) across many evaluation metrics.
- North America > Canada > Quebec > Montreal (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Workflow (1.00)
- Research Report > New Finding (1.00)